Mechanism for safe byte code in a tracing framework

ABSTRACT

A method for evaluating safety of a tracing program involves loading a byte code in a tracing framework, where the byte code includes instructions of the tracing program, validating the instructions when loading the byte code, performing one or more safety checks on the instructions while performing virtual machine emulation of the instructions, reporting an error condition and aborting virtual machine emulation of an unsafe instruction in the instructions when the safety check(s) detect(s) the unsafe instruction, and completing virtual machine emulation of a safe instruction in the instructions when the safety check(s) detect(s) the safe instruction, after aborting virtual machine emulation of the unsafe instruction.

BACKGROUND

A tracing framework is a collection of software routines and tools that permit a user to instrument and record the activity of one or more executing programs, including an operating system kernel. Tracing frameworks typically permit users to describe instrumentation requests by naming one or more probes, which are locations of interest that can be used as data-recording sites within an instrumented program. Tracing frameworks also permit users to associate these probes with one or more actions. The actions describe what tracing operations should be performed when the executing instrumented program passes through the probe site (i.e., when a probe triggers). Tracing frameworks typically provide either a defined set of actions at each probe, a set of user-selectable actions, or the ability to execute an essentially arbitrary set of actions composed in a programming language (such as C, C++, or Pascal). In tracing frameworks that support a programming language for describing actions, language statements are compiled into an intermediate form or directly into machine code and are then executed when the probe triggers.

If the tracing framework permits instrumentation of the running operating system kernel itself, the instrumentation service takes the compiled intermediate form of the tracing request and loads it into the operating system kernel as part of enabling the corresponding instrumentation. The instrumentation code executes as part of the operating system kernel itself either directly on the processor or through a virtual machine or interpreter provided by the instrumentation service that executes inside the operating system kernel. Because the operating system is an essential service without which the computer system cannot function, a tracing framework for an operating system kernel makes provisions for safety, so an improperly constructed or maliciously designed tracing program cannot damage the operating system or deny service to users. If provisions for safety are not resolved, the tracing system cannot be usefully deployed in any environment where the operating system is shared between users or performs an important function.

Implementers of tracing frameworks typically ignore this problem and rely on the access control measures for the users (i.e., to only allow persons that are sufficiently privileged or knowledgeable on the system), or the implementers have implemented a variety of cumbersome mechanisms to enforce security of the compiled instrumentation.

SUMMARY

In general, in one aspect, an embodiment of the invention relates to a method for protecting a byte code in a tracing framework, comprising validating a plurality of instructions when loading the byte code, and performing at least one safety check while executing the plurality of instructions during a virtual machine emulation, wherein the at least one safety check evaluates for a control transfer to an earlier instruction in the byte code sequence.

In general, in one aspect, an embodiment of the invention relates to a mechanism for protecting a byte code, comprising an instruction validator configured to validate a plurality of instructions when loading the byte code, a safety check facility configured to perform at least one safety check while executing the plurality of instructions during a virtual machine emulation, wherein the at least one safety check evaluates for a transfer to an earlier instruction in the byte code sequence.

In general, in one aspect, an embodiment of the invention relates to a computer system for protecting a byte code in a tracing framework, comprising a processor, a memory, a storage device, and software instructions stored in the memory for enabling the computer system to validate a plurality of instructions when loading the byte code, and perform at least one safety check while executing the plurality of instructions during a virtual machine emulation, wherein the at least one safety check evaluates for a control transfer to an earlier instruction in the byte code sequence.

Other aspects of embodiments of the invention will be apparent from the following description and the appended claims.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 shows a networked computer system in accordance with one embodiment of the invention.

FIG. 2 shows a flow diagram for a mechanism for protecting byte code in a tracing framework in accordance with one embodiment of the invention.

FIG. 3 shows a flow chart of a method for protecting byte code in a tracing framework in accordance with one embodiment of the invention.

DETAILED DESCRIPTION

Exemplary embodiments of the invention will be described with reference to the accompanying drawings. Like items in the drawings are shown with the same reference numbers. Further, the use of “ST” in the drawings is equivalent to the use of “Step” in the detailed description below.

In an embodiment of the invention, numerous specific details are set forth in order to provide a more thorough understanding of the invention. However, it will be apparent to one of ordinary skill in the art that the invention may be practiced without these specific details. In other instances, well-known features have not been described in detail to avoid obscuring the invention.

An embodiment of the invention may be implemented on virtually any type of computer regardless of the platform being used. For example, as shown in FIG. 1, a networked computer system (100) includes a processor (102), associated memory (104), a storage device (106), and numerous other elements and functionalities typical of today's computers (not shown). The networked computer (100) may also include input means, such as a keyboard (108) and a mouse (110), and output means, such as a monitor (112). The networked computer system (100) is connected to a local area network (LAN) or a wide area network via a network interface connection (not shown). Those skilled in the art will appreciate that these input and output means may take other forms. Further, those skilled in the art will appreciate that one or more elements of the aforementioned computer (100) may be located at a remote location and connected to the other elements over a network.

In one embodiment, the present invention relates to a virtual machine interpreter with safety mechanisms that permit complex and arbitrary programs to be constructed by a compiler and encoded in an instruction set byte code. This mechanism permits validation for safety against both damage to the system as well as denial-of-service attacks. These design features, combined with a set of run-time checks, permit arbitrary tracing programs to be compiled and loaded into the operating system kernel where the programs can be either rejected immediately or executed safely.

FIG. 2 shows a flow diagram for a mechanism for protecting byte code in a tracing framework in accordance with one embodiment of the invention. A tracing program (200), including tracing functions desired by a user, is provided to a compiler (202) designed to accept the tracing program (200). The compiler (202) compiles the tracing program (200) into byte code (204) in a manner that is well-known in the art.

Byte code (204) is an instruction set that accompanies a virtual machine or program interpreter. The byte code (204) serves the same function for a virtualized representation of computer hardware that an instruction set, defined as a set of binary encodings, serves for a standard microprocessor.

Once the byte code is generated, a tracing framework (206) accepts the byte code (204) as input and begins to evaluate the byte code using a mechanism, such as a virtual machine interpreter (208). This interpreter (208) includes functionality described in FIG. 3 below to define safe byte code (210) from a portion of the byte code (204). In one embodiment of the invention, all byte code (204) resides within the virtual machine interpreter (208) where it is accessed and emulated. Once the byte code (204) is deemed safe byte code (210), it may be used by a virtual machine.

In order to provide functionality suitable for arbitrary instrumentation code, a virtual machine provides a set of instructions similar to those supported by common hardware microprocessors. Table 1 identifies a set of instructions in column 1 with a corresponding description of the instruction (as related to virtual machines) in column 2. Alongside each of these instruction categories, column 3 shows potential opportunities for malformed or maliciously designed programs to damage the operating system or deny service to system users if such instructions are executed without proper safety mechanisms by a virtual machine interpreter.

TABLE 1

Instruction: Arithmetic and logical operations
Description: A virtual machine supports the ability to add, subtract, multiply, and divide numbers and perform other common logical operations on them (e.g., boolean AND, OR, etc).
Potential Danger: Several arithmetic operations cause processor exceptions to indicate certain error conditions. For example, integer division by zero typically results in a hardware exception condition on most microprocessors.

Instruction: Load operations
Description: A virtual machine supports the ability to access memory locations associated with the instrumented program (in this case, the data address space of the operating system kernel itself).
Potential Danger: Load operations may be misaligned, in that some microprocessors require that a 2-byte load occur on an address value that is a multiple of 2, a 4-byte load occur on an address value that is a multiple of 4, etc. If a misaligned load is attempted, the processor signals an exception. Load operations may be attempted from invalid addresses. Modern operating systems use a technique called virtual memory whereby the set of addresses associated with a user process or the operating system kernel are indirectly mapped to the physical memory addresses of the computer system. The address space of the operating system kernel is therefore sparsely populated in that not all addresses are valid and mapped to a physical memory location assigned to the operating system kernel. If a load from an address with no corresponding translation to a physical memory location is attempted, the processor signals an exception. Load operations may be attempted from addresses that are mapped to hardware devices other than memory storage and that have side effects when accessed, such as device hardware programmable input/output registers. Some modern operating system kernels map device control registers into the address space of the operating system so that they can be manipulated with load and store instructions. If some of these locations have side effects when loads are attempted, a sequence of loads incompatible with the mechanisms of the underlying device hardware could damage or disrupt the operation of the device or computer system itself.

Instruction: Store operations
Description: A virtual machine supports the ability to modify memory locations associated with the tracing program itself. This permits such programs to create data structures and manipulate variables.
Potential Danger: Store operations may be misaligned in the same manner as loads and can trigger a processor exception. Store operations may be attempted to invalid locations in the same manner as loads and can trigger a processor exception. Store operations may be attempted to memory-mapped device hardware registers with side effects in the same manner as loads, resulting in damage to or disruption of a hardware device or the system. Store operations may also be attempted to a memory location that is properly aligned and valid but that is associated with a part of the operating system kernel other than the storage allocated by the virtual machine for use by the tracing program itself. If stores were permitted to such locations, tracing programs would be able to inadvertently or deliberately damage the operating system kernel.

Instruction: Control transfer operations
Description: A virtual machine supports the ability for the tracing program to direct the virtual machine to transfer control to a different point within the byte code instruction stream. Such control transfer operations are required to implement standard programming constructs such as if-then statements and logical conditions.
Potential Danger: Control transfer instructions such as those that permit resetting the virtual machine program counter to a particular address (a “jump”) and incrementing or decrementing the program counter by a particular amount (a “branch”) can be used to transfer control to invalid addresses, addresses that are not associated with virtual machine code, and to create programs that are non-terminating (i.e., a program that loops infinitely without ever reaching a program control flow endpoint). Illegal transfers can cause exception conditions such as those enumerated for loads and stores above. Infinite loops or infinite recursion mean that program control will never return from the virtual machine to the operating system kernel, thereby utilizing the instrumentation service as a denial-of-service attack against other operating system clients.

In addition to the potential dangers included in Table 1, the following issues also need to be resolved to allow protection of byte code when using a virtual machine. First, if control transfer instructions are also provided to execute a set of predefined subroutines inside the virtual machine or instrumentation service, the program may be able to manipulate any of these services into one of the problem areas described above even though the service routines are not directly implemented in the tracing program. Second, if programs of arbitrary size are permitted, a single linear sequence of byte code instructions of vast size could be created that would take so long to execute that the result is similar to a denial-of-service attack or to the behavior of a program with an infinite loop or infinite recursion.

In one embodiment, a set of attributes for a safe byte code and virtual machine interpreter is described below. These mechanisms provide for both efficient code validation and execution. The mechanisms also allow sufficient flexibility for the implementation of a programming language that permits useful expression evaluation and conditional constructs for use in a tracing framework that can be applied to an operating system.

FIG. 3 shows a flow chart of a method for protecting byte code in a tracing framework in accordance with one embodiment of the invention. This protection of byte code may be implemented by performing a two-pass approach. Initially, a tracing program is obtained (Step 300) using a command line or graphical user interface. In the first pass, instructions from the tracing program are validated during a single pass at load time (Step 302). The validation pass is extensive and described in detail below.

Next, a determination is made whether the instructions are validated as safe (Step 304). If the instructions are not validated as safe, then the tracing program is rejected (Step 306). If the instructions are validated, protection for the byte code is implemented. Specifically, a set of safety checks is performed prior to and while emulating the validated instructions (Step 308). This “emulation” pass is designed to address the potential dangers shown in Table 1 above, and is also discussed in detail below.

Upon completion of the safety checks, a determination is made whether the instruction is safe (Step 310). If the instruction is not safe, an error is reported and the emulation is aborted (Step 312). If the instruction is safe, emulation of the instruction is completed (Step 314). Next, a determination is made whether additional instructions remain (Step 316). If instructions remain, control returns to Step 308 (i.e., a set of safety checks is performed on another validated instruction) and Steps 308-316 repeat (as needed) until all instructions have been examined. If no instructions remain, processing terminates.
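The control flow of FIG. 3 can be sketched as follows. This is a minimal illustration only, not the implementation of any particular tracing framework; the names `run_tracing_program`, `validate`, `is_safe`, and `emulate` are hypothetical, and the three callbacks stand in for the validation pass, the run-time safety checks, and the emulation of a single instruction.

```python
def run_tracing_program(program, validate, is_safe, emulate):
    """Drive the two passes of FIG. 3 over a list of instructions.

    validate(program)  -> bool  (hypothetical load-time validation, Steps 302-304)
    is_safe(instr)     -> bool  (hypothetical run-time safety checks, Steps 308-310)
    emulate(instr, pc) -> int   (hypothetical emulation step, returns the next pc)
    """
    if not validate(program):
        return ("rejected", None)        # Step 306: reject the whole program

    pc = 0
    while pc < len(program):             # Step 316: do instructions remain?
        instr = program[pc]
        if not is_safe(instr):
            return ("aborted", pc)       # Step 312: report error, abort emulation
        pc = emulate(instr, pc)          # Step 314: complete the instruction
    return ("completed", None)
```

The sketch returns a status and the offset of the offending instruction so that a caller could report the error condition.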

During the “validation” pass (Step 302) described above and shown in FIG. 3, the following five steps are performed by an instrumentation service for each instruction. First, a set of standard checks is performed to validate the instruction, including verifying that the “opcode” bits (i.e., the bits that describe the instruction type) name a valid operation. If an opcode is not valid, the entire tracing program is rejected.

Second, a determination is made whether each operand name referenced by the instruction refers to a valid operand provided by the virtual machine emulator. If an operand name is not valid, the tracing program is rejected. The term operand name, as used above, refers to a label for a set of operands in either a register-based model (i.e., instructions operate on a fixed-size set of fixed-size storage locations (registers)) or a stack-based model (i.e., instructions operate on a set of values pushed onto a virtualized stack of operands).

Third, any instructions that transfer control flow must be direct branches to a fixed offset or location within the tracing program instruction stream. The destination location within the instruction stream is computed from the instruction. If it lies outside of the instruction stream or at an instruction offset less than or equal to the offset of the branch instruction itself, then the tracing program is rejected. Because every accepted branch targets a strictly later instruction, no loop can be formed and each instruction is emulated at most once.

Fourth, any instructions that invoke an instrumentation service subroutine are checked to determine that a valid subroutine is named; if not, the tracing program is rejected. Lastly, a determination is made whether the total number of instructions in the input byte code stream exceeds the configurable limit on the number of instructions. If so, the tracing program is rejected.
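The five validation checks described above can be sketched as a single pass over the instructions. This is an illustrative sketch under stated assumptions: the `Instr` record, the opcode names, the register count, the subroutine numbers, and the default limit are all hypothetical and are not taken from any real instruction set.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Instr:
    opcode: str
    regs: tuple = ()               # register operand names (integers)
    target: Optional[int] = None   # branch destination, as an instruction offset
    subr: Optional[int] = None     # subroutine number for a call

# Hypothetical configuration, chosen only for illustration.
VALID_OPCODES = {"add", "div", "load", "store", "branch", "call", "ret"}
NUM_REGISTERS = 8
VALID_SUBROUTINES = {0, 1, 2}
MAX_INSTRUCTIONS = 1000

def validate(program, max_instructions=MAX_INSTRUCTIONS):
    """Return None if the program is accepted, else a rejection reason."""
    if len(program) > max_instructions:                      # size limit
        return "too many instructions"
    for offset, ins in enumerate(program):
        if ins.opcode not in VALID_OPCODES:                  # opcode check
            return f"invalid opcode at {offset}"
        if any(not 0 <= r < NUM_REGISTERS for r in ins.regs):  # operand check
            return f"invalid operand at {offset}"
        if ins.opcode == "branch":                           # forward-only branch
            if ins.target is None or not offset < ins.target < len(program):
                return f"backward or out-of-range branch at {offset}"
        if ins.opcode == "call" and ins.subr not in VALID_SUBROUTINES:
            return f"invalid subroutine at {offset}"         # subroutine check
    return None
```

Since accepted branches only move forward, the program counter strictly increases during emulation, so a program of N instructions executes at most N instruction steps; combined with the size limit, this bounds the work per probe firing, apart from the cost of any subroutines invoked.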

During the emulation pass (Step 308) described above and shown in FIG. 3, the following five steps are performed. First, for any arithmetic instruction that can result in a processor exception, the input operands are checked for exceptional conditions and, if any are found, execution is aborted. Alternatively, a mechanism is provided whereby the processor exception for an arithmetic exception can be intercepted by the virtual machine emulator.

Second, for any load or store instruction, the effective address is checked for appropriate alignment before issuing the underlying microprocessor instructions. If the alignment is improper, execution is aborted. Alternatively, a mechanism is provided whereby the processor exception for a misaligned load or store can be intercepted by the virtual machine emulator. Third, for any load or store instruction, a mechanism is provided whereby either the effective address is checked for validity prior to executing the load or store, or the processor exception for an invalid address is intercepted by the virtual machine emulator.

Fourth, for any load or store instruction, a mechanism is provided whereby the effective address is checked against a list of pre-computed address ranges assigned to memory-mapped device hardware state. If the effective address falls within any of these ranges, emulation is aborted and no load or store instruction is issued. Lastly, for any store instruction, a mechanism is provided whereby the effective address is checked against a list of pre-computed address ranges assigned by the virtual machine to the tracing program. If the effective address does not fall within any of these ranges, emulation is aborted and no store instruction is issued.
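The run-time checks above can be sketched as guard functions that run before the underlying operation is issued. This is a minimal sketch under stated assumptions: the address ranges, the `is_mapped` callback (standing in for whatever address-validity mechanism the platform provides), and all function names are hypothetical. It shows only the "check first" variant, not the alternative of intercepting processor exceptions.

```python
# Hypothetical pre-computed ranges, as half-open [start, end) intervals.
DEVICE_RANGES = [(0xF000, 0xF100)]    # memory-mapped device hardware state
SCRATCH_RANGES = [(0x1000, 0x1100)]   # storage assigned to the tracing program

class EmulationAborted(Exception):
    """Raised when a safety check rejects the current instruction."""

def overlaps(addr, size, ranges):
    # True if any byte of [addr, addr + size) lies inside one of the ranges.
    return any(addr < end and start < addr + size for start, end in ranges)

def contained(addr, size, ranges):
    # True if all of [addr, addr + size) lies inside a single range.
    return any(start <= addr and addr + size <= end for start, end in ranges)

def check_divide(divisor):
    if divisor == 0:                                  # arithmetic check
        raise EmulationAborted("division by zero")

def check_access(addr, size, is_store, is_mapped):
    if addr % size != 0:                              # alignment check
        raise EmulationAborted("misaligned access")
    if not is_mapped(addr):                           # validity check
        raise EmulationAborted("invalid address")
    if overlaps(addr, size, DEVICE_RANGES):           # device-register check
        raise EmulationAborted("access to device memory")
    if is_store and not contained(addr, size, SCRATCH_RANGES):
        raise EmulationAborted("store outside tracing program storage")
```

Loads skip the final check, so they may read kernel memory outside the scratch ranges; stores must fall entirely within storage assigned to the tracing program.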

In one embodiment of the invention, the attributes of a particular byte code named DTrace Intermediate Format (DIF) are described below. In DIF, instructions are encoded in 32-bit words where the highest order 8 bits are an integer naming one of the valid virtual machine opcodes. DIF also provides for a fixed number of registers named using integers by the virtual machine. When instructions refer to registers, one or more groups of 8 bits within the remaining 24 bits are assigned to indicate the name of each register referenced by the instruction.
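The encoding described above can be illustrated with a small encode/decode pair. The text fixes only the 32-bit word size, the opcode in the highest 8 bits, and 8-bit register fields within the remaining 24 bits; the order of the three register fields below (`r1`, `r2`, `rd`) is an assumption made for illustration, as are the function names.

```python
def encode(opcode, r1, r2, rd):
    """Pack an opcode and three 8-bit register names into a 32-bit word."""
    for field in (opcode, r1, r2, rd):
        if not 0 <= field <= 0xFF:
            raise ValueError("each field must fit in 8 bits")
    return (opcode << 24) | (r1 << 16) | (r2 << 8) | rd

def decode(word):
    """Split a 32-bit word into (opcode, r1, r2, rd)."""
    if not 0 <= word <= 0xFFFFFFFF:
        raise ValueError("instruction word must fit in 32 bits")
    opcode = (word >> 24) & 0xFF   # highest-order 8 bits name the opcode
    r1 = (word >> 16) & 0xFF       # assumed position of the first source register
    r2 = (word >> 8) & 0xFF        # assumed position of the second source register
    rd = word & 0xFF               # assumed position of the destination register
    return opcode, r1, r2, rd
```

A fixed-width layout of this kind lets the validation pass extract the opcode and every register name with shifts and masks, without parsing variable-length instructions.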

In one embodiment of the invention, arithmetic instructions in DIF operate only on values that are currently stored in virtual machine registers. Further, load and store instructions operate on effective addresses stored in a single virtual machine register. For loads, the result of the load is placed in a register named in the instruction. For stores, the value to be stored is first placed in a register named in the instruction.

In one embodiment of the invention, an opcode for performing a subroutine call is provided in DIF that uses 16 of the remaining 24 bits in the instruction word to explicitly encode an integer corresponding to the desired subroutine. Further, a set of opcodes for performing branches based on a typical set of integer condition codes is provided. Each branch opcode uses the remaining 24 bits of the instruction word to indicate the offset of the instruction word within the instruction stream to which control should transfer if the condition codes match the desired branch condition.
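Because the branch destination and the subroutine number are encoded directly in the instruction word, both can be checked at load time without executing anything. A minimal sketch follows; the function names are hypothetical, and placing the subroutine number in the low 16 bits is an assumption, since the text states only that 16 of the 24 bits are used.

```python
def branch_target(word):
    # The low 24 bits of a branch word give the destination instruction offset.
    return word & 0xFFFFFF

def call_subroutine(word):
    # Assumed layout: the subroutine number occupies the low 16 bits.
    return word & 0xFFFF

def branch_is_valid(word, offset, program_length):
    # Accept only destinations strictly after the branch and inside the stream.
    return offset < branch_target(word) < program_length
```

For example, a branch word whose low 24 bits hold 5 is accepted when it sits at offset 2 of a 10-instruction program, and is rejected when it sits at offset 5 (a branch to itself) or when the program has only 5 instructions.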

In one embodiment, the present invention supports parallel evolution of the tracing framework compiler and instrumentation service. The invention also provides efficient transfer between the compiler and this service. The invention can be used uniformly in all mechanisms provided by the tracing framework for enabling or verifying instrumentation, and allows for stable, persistent storage of compiled tracing programs.

While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having benefit of this disclosure, will appreciate that other embodiments can be devised which do not depart from the scope of the invention as disclosed herein. Accordingly, the scope of the invention should be limited only by the attached claims.

1. A method for evaluating safety of a tracing program, comprising: loading a byte code in a tracing framework, wherein the byte code comprises a plurality of instructions of the tracing program; validating the plurality of instructions when loading the byte code; performing at least one safety check on the plurality of instructions while performing virtual machine emulation of the plurality of instructions; reporting an error condition and aborting virtual machine emulation of an unsafe instruction in the plurality of instructions when the at least one safety check detects the unsafe instruction; and completing virtual machine emulation of a safe instruction in the plurality of instructions when the at least one safety check detects the safe instruction, after aborting virtual machine emulation of the unsafe instruction.
2. The method of claim 1, wherein the validating the plurality of instructions comprises: verifying that opcode bits for the plurality of instructions identify valid operations.
3. The method of claim 1, wherein the validating the plurality of instructions comprises: evaluating whether an operand name referenced in the plurality of instructions does not refer to a valid operand provided by the virtual machine emulation.
4. The method of claim 1, wherein the validating the plurality of instructions comprises: computing a destination location within an instruction stream from one of the plurality of instructions; and evaluating whether the destination location is invalid.
5. The method of claim 1, wherein the validating the plurality of instructions comprises: determining whether a subroutine name is valid, when one of the plurality of instructions invokes a named subroutine.
6. The method of claim 1, wherein the validating the plurality of instructions comprises: evaluating whether a total number of the plurality of instructions in the byte code exceeds a user-configurable limit.
7. The method of claim 1, wherein the validating occurs during a single pass over the plurality of instructions.
8. The method of claim 1, wherein the byte code comprises an instruction set of a virtual machine.
9. The method of claim 8, wherein the instruction set comprises at least one selected from the group consisting of an arithmetic operation, a logical operation, a load operation, a store operation, and a control transfer operation.
10. The method of claim 9, wherein the performing the at least one safety check comprises: evaluating whether the arithmetic operation results in a processor exception.
11. The method of claim 9, wherein the performing the at least one safety check comprises: evaluating an effective address of the load operation before issuing underlying instructions; determining an appropriate alignment for the effective address; and determining whether the effective address is improperly aligned based on the appropriate alignment.
12. The method of claim 9, wherein the performing the at least one safety check comprises: evaluating an effective address of the store operation before issuing underlying instructions; determining an appropriate alignment for the effective address; and determining whether the effective address is improperly aligned based on the appropriate alignment.
13. The method of claim 9, wherein the performing the at least one safety check comprises: evaluating an effective address of an operation for validity prior to executing the operation, wherein the operation is one selected from the group consisting of the load operation and the store operation.
14. The method of claim 9, wherein the performing the at least one safety check comprises: evaluating whether an effective address of an operation falls outside a list of pre-computed address ranges assigned to a memory-mapped device hardware state, wherein the operation is one selected from the group consisting of the load operation and the store operation.
15. The method of claim 9, wherein the performing the at least one safety check comprises: evaluating whether an effective address of the store operation falls outside a list of pre-computed address ranges assigned by the virtual machine to the tracing program.
16. A tracing framework, stored on a computer readable memory, comprising: an instruction validator configured to: load a byte code comprising a plurality of instructions of a tracing program, and validate the plurality of instructions when loading the byte code; and a safety check facility configured to: perform at least one safety check on the plurality of instructions while performing virtual machine emulation of the plurality of instructions, report an error condition and abort virtual machine emulation of an unsafe instruction in the plurality of instructions when the at least one safety check detects the unsafe instruction, and complete virtual machine emulation of a safe instruction in the plurality of instructions when the at least one safety check detects the safe instruction, after aborting virtual machine emulation of the unsafe instruction.
17. The tracing framework of claim 16, wherein the instruction validator is configured to validate the plurality of instructions by: verifying that opcode bits for the plurality of instructions identify valid operations.
18. The tracing framework of claim 16, wherein the instruction validator is configured to validate the plurality of instructions by: evaluating whether an operand name referenced in the plurality of instructions does not refer to a valid operand provided by the virtual machine emulation.
19. The tracing framework of claim 16, wherein the instruction validator is configured to validate the plurality of instructions by: computing a destination location within an instruction stream from one of the plurality of instructions; and evaluating whether the destination location is invalid.
20. The tracing framework of claim 16, wherein the instruction validator is configured to validate the plurality of instructions by: determining whether a subroutine name is valid, when one of the plurality of instructions invokes a named subroutine.
21. The tracing framework of claim 16, wherein the instruction validator is configured to validate the plurality of instructions by: evaluating whether a total number of the plurality of instructions in the byte code exceeds a user-configurable limit.
22. The tracing framework of claim 16, wherein the instruction validator is configured to validate during a single pass over the plurality of instructions.
23. The tracing framework of claim 16, wherein the byte code comprises an instruction set of a virtual machine.
24. The tracing framework of claim 16, wherein the instruction set comprises at least one selected from the group consisting of an arithmetic operation, a logical operation, a load operation, a store operation, and a control transfer operation.
25. The tracing framework of claim 24, wherein the safety check facility is configured to perform the at least one safety check by: evaluating whether the arithmetic operation results in a processor exception.
26. The tracing framework of claim 24, wherein the safety check facility is configured to perform the at least one safety check by: evaluating an effective address of the load operation before issuing underlying instructions; determining an appropriate alignment for the effective address; and determining whether the effective address is improperly aligned based on the appropriate alignment.
27. The tracing framework of claim 24, wherein the safety check facility is configured to perform the at least one safety check by: evaluating an effective address of the store operation before issuing underlying instructions; determining an appropriate alignment for the effective address; and determining whether the effective address is improperly aligned based on the appropriate alignment.
28. A computer system for evaluating safety of a tracing program, comprising: a processor; a memory; a storage device; and software instructions stored in the memory for enabling the computer system to: load a byte code in a tracing framework, wherein the byte code comprises a plurality of instructions of the tracing program; validate the plurality of instructions when loading the byte code; perform at least one safety check on the plurality of instructions while performing virtual machine emulation of the plurality of instructions; and report an error condition and abort virtual machine emulation of an unsafe instruction in the plurality of instructions when the at least one safety check detects the unsafe instruction, and complete virtual machine emulation of a safe instruction in the plurality of instructions when the at least one safety check detects the safe instruction, after aborting virtual machine emulation of the unsafe instruction.